Random Effects Models for Network Data

Author

  • Peter D. Hoff
Abstract

One impediment to the statistical analysis of network data has been the difficulty in modeling the dependence among the observations. In the very simple case of binary (0-1) network data, some researchers have parameterized network dependence in terms of exponential family representations. Accurate parameter estimation for such models is quite difficult, and the most commonly used models often display a significant lack of fit. Additionally, such models are generally limited to binary data. In contrast, random effects models have been a widely successful tool in capturing statistical dependence for a variety of data types, and allow for prediction, imputation, and hypothesis testing within a general regression context. We propose novel random effects structures to capture network dependence, which can also provide graphical representations of network structure and variability.

1 Network Dependence

Network data typically consist of a set of $n$ nodes and a relational tie $y_{i,j}$, measured on each ordered pair of nodes $i, j = 1, \ldots, n$. This framework has many applications, including the study of war, trade, the behavior of epidemics, the interconnectedness of the World Wide Web, and telephone calling patterns. It is often of interest to relate each network response $y_{i,j}$ to a possibly pair-specific, vector-valued predictor variable $x_{i,j}$. A flexible framework for doing so is the generalized linear model (see, for example, McCullagh and Nelder 1983), in which the expected value of the response is modeled as a function of a linear predictor $\beta' x_{i,j}$, where $\beta$ is an unknown vector of regression coefficients to be estimated from the data. The ordinary regression model $E(y_{i,j}) = \beta' x_{i,j}$ is perhaps the most commonly used model of this type. A generalized linear model for binary (0-1) data is logistic regression, which relates the expectation of the response to the regression variable via the relation $g(E[y_{i,j}]) = \beta' x_{i,j}$, where $g(p) = \log \frac{p}{1-p}$.
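The logit link and its inverse can be written out in a few lines. The following is a minimal sketch using only the Python standard library; the function names `logit`, `inv_logit`, and `tie_probability` are hypothetical and chosen for illustration only.

```python
import math

def logit(p):
    """Link function g(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def tie_probability(beta, x):
    """E[y_ij] under logistic regression, given coefficient vector beta
    and predictor vector x of the same length."""
    eta = sum(b * xk for b, xk in zip(beta, x))  # linear predictor beta' x
    return inv_logit(eta)
```

For example, `tie_probability([0.0, 0.0], [1.0, 1.0])` gives a linear predictor of zero and therefore a tie probability of one half.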
As an example of the use of such statistical models, consider the analysis of strong friendship ties among 13 boys and 14 girls in a sixth-grade classroom, as collected by Hansell (1984). Each student was asked if they liked each other student "a lot", "some", or "not much". A strong friendship tie is considered present if a student likes another student "a lot". Also recorded is the sex of each student. The data, presented in Figure 1, suggest a general preference for same-sex friendship ties. Of potential interest is statistical estimation of this preference, as well as a confidence interval for its value.

One approach for such statistical analysis would be to formulate the logistic regression model $g(E[y_{i,j} \mid x_{i,j}, \beta]) = \beta_0 + \beta_1 x_{i,j}$, where $x_{i,j}$ is one if children $i$ and $j$ are of the same sex, and zero otherwise, and $\beta = (\beta_0, \beta_1)$ are parameters to be estimated. Estimation of regression coefficients $\beta$ typically proceeds under the assumption that the observations are conditionally independent given $\beta$ and the $x_{i,j}$'s. However, this assumption is violated by many network datasets. For example, the data on friendship ties display several types of dependence:

Within-node dependence: The number of ties sent by each student varies considerably, ranging from 0 to 19 with a mean of 5.8 and a standard deviation of 4.7 (the standard deviation of the number of ties received was 3.2). This node-level variability suggests that responses from the same individual are positively dependent, in that the probability that $y_{i,j} = 1$ ($i$ sends a tie to $j$) is high if we know that $y_{i,k} = 1$ for many other nodes $k$, and lower if the $y_{i,k}$ are mostly zero. More formally, we may wish to have a model in which $\Pr(y_{i,j} = 1 \mid y_{i,1}, \ldots, y_{i,j-1}, y_{i,j+1}, \ldots, y_{i,n})$ is an increasing function of $y_{i,k}$, $k \neq j$.
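The within-node dependence described above can be illustrated on synthetic data. The following is a minimal sketch, not the author's model and not the Hansell data. It assumes a hypothetical additive sender effect `a_i` on the logit scale, and the values of `beta0`, `beta1`, and `sender_sd` are made up for illustration. It also fits the independence logistic regression, whose maximum likelihood estimate with a single 0/1 predictor reduces to the logits of the observed tie proportions in the two groups of pairs.

```python
import math
import random
import statistics

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

def simulate_ties(sex, beta0, beta1, sender_sd, rng):
    """Simulate a directed 0-1 network with a same-sex predictor.

    sender_sd is the standard deviation of a hypothetical node-level
    sender effect a_i (an assumption of this sketch); sender_sd = 0
    gives ties that are conditionally independent given beta and x.
    """
    n = len(sex)
    a = [rng.gauss(0.0, sender_sd) for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-ties
            x = 1 if sex[i] == sex[j] else 0
            p = inv_logit(beta0 + beta1 * x + a[i])
            y[i][j] = 1 if rng.random() < p else 0
    return y

def out_degrees(y):
    """Number of ties sent by each node."""
    return [sum(row) for row in y]

def naive_logistic_fit(y, sex):
    """MLE of (beta0, beta1) treating all ties as independent.

    Assumes both observed tie proportions lie strictly between 0 and 1.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [number of ties, number of pairs]
    n = len(sex)
    for i in range(n):
        for j in range(n):
            if i != j:
                x = 1 if sex[i] == sex[j] else 0
                counts[x][0] += y[i][j]
                counts[x][1] += 1
    p_diff = counts[0][0] / counts[0][1]
    p_same = counts[1][0] / counts[1][1]
    b0 = logit(p_diff)
    return b0, logit(p_same) - b0

# Synthetic classroom of 13 + 14 nodes; all parameter values are illustrative.
rng = random.Random(0)
sex = [0] * 13 + [1] * 14
for sd in (0.0, 1.5):
    y = simulate_ties(sex, beta0=-2.5, beta1=2.0, sender_sd=sd, rng=rng)
    deg = out_degrees(y)
    print(sd, statistics.mean(deg), statistics.pstdev(deg),
          naive_logistic_fit(y, sex))
```

Comparing the spread of the out-degrees for `sender_sd = 0.0` and `sender_sd = 1.5` gives a rough sense of how much node-level variability an independence model leaves unexplained.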


Publication date: 2003